Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling

Authors

  • Raphaël Rubino
  • Tommi A. Pirinen
  • Miquel Esplà-Gomis
  • Nikola Ljubesic
  • Sergio Ortiz-Rojas
  • Vassilis Papavassiliou
  • Prokopis Prokopidis
  • Antonio Toral
Abstract

This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish–English language pair at the WMT 2015 translation task. We tackle the lack of resources and complex morphology of the Finnish language by (i) crawling parallel and monolingual data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Several statistical machine translation approaches are evaluated and then combined to obtain our final submissions, which are the top performing English-to-Finnish unconstrained (all automatic metrics) and constrained (BLEU), and Finnish-to-English constrained (TER) systems.

Similar resources

Abu-MaTran at WMT 2016 Translation Task: Deep Learning, Morphological Segmentation and Tuning on Character Sequences

This paper presents the systems submitted by the Abu-MaTran project to the English-to-Finnish language pair at the WMT 2016 news translation task. We applied morphological segmentation and deep learning in order to address (i) the data scarcity problem caused by the lack of in-domain parallel data in the constrained task and (ii) the complex morphology of Finnish. We submitted a neural machine t...

Abu-MaTran at WMT 2014 Translation Task: Two-step Data Selection and RBMT-Style Synthetic Rules

This paper presents the machine translation systems submitted by the Abu-MaTran project to the WMT 2014 translation task. The language pair concerned is English–French with a focus on French as the target language. The French to English translation direction is also considered, based on the word alignment computed in the other direction. Large language and translation models are built using all ...

Edinburgh's Syntax-Based Systems at WMT 2015

This paper describes the syntax-based systems built at the University of Edinburgh for the WMT 2015 shared translation task. We developed systems for all language pairs except French-English. This year we focused on: translation out of English using tree-to-string models; continuing to improve our English-German system; and source-side morphological segmentation of Finnish using Morfessor.

Abu-MaTran: Automatic building of Machine Translation

We present the current status of Abu-MaTran (http://www.abumatran.eu), a 4-year project (January 2013–December 2016) on rapid development of machine translation for under-resourced languages. It is funded under Marie Curie's Industry-Academia Partnerships and Pathways 2012 programme. This is a consortium-based project with 5 partners (4 academic and 1 industrial).

Publication date: 2015